
    Spatio-temporal video autoencoder with differentiable memory

    We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. Here we target motion changes and use, as the temporal decoder, a robust optical flow prediction module together with an image sampler that serves as a built-in feedback loop. The architecture is end-to-end differentiable. At each time step, the system receives a video frame as input, predicts the optical flow as a dense transformation map based on the current observation and the LSTM memory state, and applies it to the current frame to generate the next frame. By minimising the reconstruction error between the predicted next frame and the corresponding ground-truth next frame, we train the whole system to extract features useful for motion estimation without any supervision. We present one direct application of the proposed framework: weakly-supervised semantic segmentation of videos through label propagation using optical flow.
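The core feedback step described above (apply a predicted dense flow map to the current frame, then compare the result with the true next frame) can be sketched as follows. This is a minimal NumPy sketch under assumptions the abstract does not state: single-channel frames, backward warping with bilinear interpolation and border clamping, and mean squared error as the reconstruction error. The names `warp_with_flow` and `reconstruction_error` are hypothetical, and the LSTM memory and the flow prediction network are omitted entirely.

```python
import numpy as np


def warp_with_flow(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a single-channel frame (H, W) with a flow field (H, W, 2).

    Output pixel (y, x) is sampled from the input at
    (y + flow[y, x, 1], x + flow[y, x, 0]) using bilinear interpolation;
    source coordinates are clamped to the image border (an assumption).
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates for every output pixel, clamped so indices stay valid.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)
    # Integer corners surrounding each source coordinate.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets serve as the bilinear interpolation weights.
    wx = src_x - x0
    wy = src_y - y0
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom


def reconstruction_error(predicted: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between predicted and ground-truth next frames (assumed loss)."""
    return float(np.mean((predicted - target) ** 2))
```

For example, a constant flow of one pixel in x applied to a 4×4 frame shifts its content one column to the left, repeating the last column at the border. Bilinear sampling is used here because its output is differentiable with respect to the flow values almost everywhere, which is what allows an image sampler to sit inside an end-to-end trained loop.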

    Joint A Contrario Ellipse and Line Detection

    This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/TPAMI.2016.2558150. We propose a line segment and elliptical arc detector that produces a reduced number of false detections on various types of images without any parameter tuning. For a given region of pixels in a grey-scale image, the detector decides whether a line segment or an elliptical arc is present (model validation). If both interpretations are possible for the same region, the detector chooses the one that best explains the data (model selection). We describe a statistical criterion based on a contrario theory, which serves for both model validation and model selection. The experimental results highlight the performance of the proposed approach compared to state-of-the-art detectors when applied to synthetic and real images. This work was partially funded by the Qualcomm postdoctoral program at École Polytechnique Palaiseau, a Google Faculty Research Award, the Marie Curie grant CIG-334283-HRGP, a CNRS chaire d’excellence and chaire Jean Marjoulet, and EPSRC grant EP/L010917/1.
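The validation and selection steps can be illustrated with the number of false alarms (NFA) used by a contrario detectors such as LSD. A candidate region of n pixels, k of which agree with the model up to a precision p, is accepted when N_tests · P[Binomial(n, p) ≥ k] ≤ ε. The sketch below is a simplified illustration under stated assumptions and is not the paper's exact criterion: the number of tests for each model is supplied as a hypothetical parameter, ε defaults to 1, and model selection simply keeps the interpretation with the smaller NFA. The helper names are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability that X >= k for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def nfa(num_tests: float, n: int, k: int, p: float) -> float:
    """Number of false alarms: expected count of equally good detections under the noise model."""
    return num_tests * binomial_tail(n, k, p)


def validate_and_select(candidates: dict, epsilon: float = 1.0):
    """Simplified validation + selection (hypothetical helper).

    `candidates` maps a model name (e.g. "line", "ellipse") to a tuple
    (num_tests, n, k, p). Returns (name, nfa) for the interpretation with the
    smallest NFA if that NFA is at most `epsilon`; otherwise returns None.
    """
    scores = {name: nfa(*params) for name, params in candidates.items()}
    best = min(scores, key=scores.get)
    return (best, scores[best]) if scores[best] <= epsilon else None
```

With p = 1/8 (the precision LSD uses by default, corresponding to an angular tolerance of 22.5°) and 1000 tests, a 20-pixel region in which every pixel is aligned yields an NFA far below 1 and is accepted. A 10-pixel region with only 2 aligned pixels yields an NFA in the hundreds and is rejected.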

    SceneNet: Understanding Real World Indoor Scenes With Synthetic Data

    Scene understanding is a prerequisite for many high-level tasks for any automated intelligent machine operating in real-world environments. Recent attempts with supervised learning have shown promise in this direction but have also highlighted the need for enormous quantities of labelled data: performance increases in proportion to the amount of data used. However, this quickly becomes prohibitive when considering the manual labour needed to collect such data. In this work, we focus on depth-based semantic per-pixel labelling as a scene understanding problem and show the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes. By carefully synthesizing training data with appropriate noise models, we show performance comparable to state-of-the-art RGB-D systems on the NYUv2 dataset despite using only depth data as input, and we set a benchmark for depth-based segmentation on the SUN RGB-D dataset. Additionally, we offer a route to generating synthesized frame or video data, and an understanding of the different factors influencing performance gains.

    SynthCam3D: Semantic Understanding With Synthetic Indoor Scenes

    We are interested in automatic scene understanding from geometric cues. To this end, we aim to bring semantic segmentation into the loop of real-time reconstruction. Our semantic segmentation is built on a deep autoencoder stack trained exclusively on synthetic depth data generated from our novel 3D scene library, SynthCam3D. Importantly, our network is able to segment real-world scenes without any noise modelling. We present encouraging preliminary results.

    Détection et identification de structures elliptiques en images : Paradigme et algorithmes (Detection and Identification of Elliptical Structures in Images: Paradigm and Algorithms)

    This thesis deals with different aspects of the detection, fitting, and identification of elliptical features in digital images. We place geometric feature detection in the a contrario statistical framework in order to obtain a combined, parameter-free detector of line segments and circular/elliptical arcs that controls the number of false detections. To improve the accuracy of the detected features, especially in cases of occluded circles/ellipses, we introduce a simple closed-form technique for conic fitting that efficiently combines the algebraic distance with the gradient orientation. Identifying a configuration of coplanar circles in images through a discriminant signature usually requires the Euclidean reconstruction of the plane containing the circles. We propose an efficient signature computation method that bypasses the Euclidean reconstruction; it relies exclusively on invariant properties of the projective plane and is therefore itself invariant under perspective transformations.
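As a point of reference for the conic-fitting step, the sketch below implements only the plain algebraic-distance fit. It finds the coefficients (a, b, c, d, e, f) of a·x² + b·xy + c·y² + d·x + e·y + f = 0 that minimize the sum of squared algebraic residuals, subject to the coefficient vector having unit norm. The solution is the right singular vector of the design matrix associated with its smallest singular value. The sketch does not include the gradient-orientation term that the thesis adds, and the function names are hypothetical.

```python
import numpy as np


def fit_conic_algebraic(x, y) -> np.ndarray:
    """Least-squares algebraic conic fit with the constraint ||(a, b, c, d, e, f)|| = 1.

    Plain algebraic-distance baseline only; no gradient-orientation term.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One row [x^2, xy, y^2, x, y, 1] per point.
    design = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    # The minimizer of ||design @ coeffs|| over unit vectors is the last row of vt.
    _, _, vt = np.linalg.svd(design)
    return vt[-1]


def algebraic_residuals(coeffs, x, y) -> np.ndarray:
    """Algebraic distance of each point (x, y) to the conic defined by coeffs."""
    a, b, c, d, e, f = coeffs
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return a * x * x + b * x * y + c * y * y + d * x + e * y + f
```

Points sampled on the circle x² + y² = 4 return coefficients proportional to (1, 0, 1, 0, 0, −4), up to sign. In practice, centring and scaling the points before fitting improves numerical conditioning. The unit-norm constraint is only one of several possible normalizations, and it is not invariant to Euclidean transformations of the data.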

    Scene Structure Inference through Scene Map Estimation

    Understanding indoor scene structure from a single RGB image is useful for a wide variety of applications, ranging from scene editing to mining statistics about space utilization. Most efforts in scene understanding focus on extracting either dense information, such as pixel-level depth or semantic labels, or very sparse information, such as bounding boxes obtained through object detection. In this paper we propose the concept of a scene map, a coarse scene representation that describes the locations of the objects present in the scene from a top-down view (i.e., as they are positioned on the floor), as well as a pipeline to extract such a map from a single RGB image. To this end, we use a synthetic rendering pipeline that supplies an adapted CNN with virtually unlimited training data. We quantitatively evaluate our results, showing that we clearly outperform a dense baseline approach, and argue that scene maps provide a useful representation for abstract indoor scene understanding.